

Section: Contracts and Grants with Industry

National Contracts

ADT Handicom

An ADT (Action of Technological Development) ran from 2008 to 2010, managed by Agnès Piquard-Kipffer. The aim of this project was to help improve French language acquisition for hard-of-hearing (HOH) children and for children with language disabilities.

A collection of three digital books has been written by Agnès Piquard-Kipffer, and a web interface has been created in order to produce other books for language-impaired children.

A workflow that transforms a text and an audio source into a video of a digital talking head has been developed. This workflow includes:

  • Automatic speech alignment. Given an acoustic signal and its text transcription, this process retrieves the length and position of each phoneme and each word, allowing the articulation of the head to be synchronized with the acoustic signal and the text display. This technology is a recognition engine resulting from previous work (ESPERE) of the EPI PAROLE.

  • A phonetic transcription tool designed in the EPI PAROLE, which has been integrated and adapted.

  • A speech synthesizer, which creates an artificial voice from a text. It is part of the tools provided to make a digital book; several software programs were tested in order to find the best result.

  • A French Cued Speech coder and talking head, improved in order to generate videos on a server. The animation consists of a 3D talking head together with a 3D hand that codes Cued Speech. This technology comes from a previous RIAM project called LABIAO.
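
The alignment step described above can be sketched in a few lines. The tuple format (phoneme, start, end in seconds) and the frame rate are illustrative assumptions, not the actual ESPERE output format:

```python
# Sketch: map forced-alignment output to the video frame ranges that
# drive the talking head's articulation. Illustrative, not ESPERE's API.

def alignment_to_keyframes(alignment, fps=25):
    """Map each aligned phoneme to the video frame range it spans."""
    keyframes = []
    for phoneme, start, end in alignment:
        first = int(round(start * fps))
        last = max(first, int(round(end * fps)) - 1)
        keyframes.append((phoneme, first, last))
    return keyframes

# Hypothetical alignment of a short French word at 25 frames/s.
alignment = [("b", 0.00, 0.08), ("o~", 0.08, 0.24), ("Z", 0.24, 0.32)]
print(alignment_to_keyframes(alignment))
# → [('b', 0, 1), ('o~', 2, 5), ('Z', 6, 7)]
```

Once each phoneme is tied to a frame range, the articulation of the head, the acoustic signal, and the text display all share the same timeline.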

A digital book written in Flash has been developed. It integrates videos of the digital head, synchronized with the text displayed on each page. Digital books can be created manually with a text editor (to write the XML file) or automatically with software that makes it easy to add all the necessary multimedia elements to the pages.
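
A page of such a book essentially pairs a talking-head video with the text it displays. The element and attribute names below are illustrative assumptions, not the project's actual XML schema:

```python
# Sketch: generate per-page book XML pairing a head video with its text.
# Tag and attribute names are hypothetical, not the project's schema.
import xml.etree.ElementTree as ET

def make_book(title, pages):
    """pages: list of (video_file, text) tuples, one per book page."""
    book = ET.Element("book", title=title)
    for video, text in pages:
        page = ET.SubElement(book, "page")
        ET.SubElement(page, "video", src=video)
        ET.SubElement(page, "text").text = text
    return ET.tostring(book, encoding="unicode")

xml = make_book("Demo", [("p1.mp4", "Bonjour !")])
print(xml)
```

The Flash reader then only has to walk the pages, playing each video while displaying the associated text.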

Data (audio source and text) are provided through a web interface. This website allows users to create digital books; through this interface, books can easily be modified, shared, and read. The website has been developed with Symfony (a PHP 5 web framework) and AJAX (Dojo Toolkit API) technologies. A linguistic study and a case-study analysis of the current version of the talking head and of the digital books were conducted (as feasibility studies) in collaboration both with the Speech Therapy School of Nancy (with eight students: Floriane Jacques, Amélie Dumont, Sophie Bardin, Elodie Racine, Claire Nostrenoff, Anaïs Laurenceau, Hélène Thiollier and Marie Gabet) and with the National Education system, through two schools and specialized teachers (Hélène Adam-Piquard and Sylvie Nussbaum).

ANR DOCVACIM

This contract, coordinated by Prof. Rudolph Sock from the Phonetic Institute of Strasbourg (IPS), addresses the exploitation of X-ray moving pictures recorded in Strasbourg in the eighties. Our contribution is the development of tools to process X-ray images in order to build an articulatory model [35]. This year we incorporated tools to remove jumps in the X-ray films, which are due to the film transport during recording. We also developed an analysis procedure to delineate velum contours and to analyze their deformations.

ANR ARTIS

This contract started in January 2009 in collaboration with LTCI (Paris), Gipsa-Lab (Grenoble) and IRIT (Toulouse). Its main purpose is the acoustic-to-articulatory inversion of speech signals. Unlike the European project ASPI, the approach followed in our group focuses on the use of standard spectral input data, i.e. cepstral vectors. The objective of the project is to develop a demonstrator enabling the inversion of speech signals in the domain of second-language learning.

This year the work has focused on the development of the inversion infrastructure using cepstral data as input. We checked that the codebook represents the articulatory-to-acoustic mapping correctly, and we also developed the optimization of the bilinear transform in order to make the comparison of natural and synthetic spectra possible.
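
The bilinear transform mentioned above warps the frequency axis with a single parameter, which is what makes natural and synthetic spectra comparable. The sketch below shows only the warping function itself; the value of alpha and its optimization are not shown, and the example values are illustrative:

```python
# Sketch: bilinear frequency warping with a single parameter alpha.
# alpha = 0 is the identity; the optimization of alpha is not shown.
import math

def warp(omega, alpha):
    """Bilinear transform of a normalized frequency omega in [0, pi]."""
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))

print(round(warp(math.pi / 2, 0.0), 4))   # → 1.5708 (pi/2, identity)
print(round(warp(math.pi / 2, 0.42), 4))  # positive alpha shifts it upward
```

Optimizing alpha so that warped natural spectra best match the synthetic ones is what allows the two to be compared term by term.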

ANR ViSAC

This ANR Jeunes Chercheurs project started in 2009, in collaboration with the Magrit group. The main purpose of ViSAC (Acoustic-Visual Speech Synthesis by Bimodal Unit Concatenation) is to propose a new approach to text-to-acoustic-visual speech synthesis that is able to animate a 3D talking head and provide the associated acoustic speech. The major originality of this work is to consider the speech signal as bimodal (composed of two channels, acoustic and visual), "viewed" from either facet, visual or acoustic. The key advantage is the guarantee that the redundancy of the two facets of speech, acknowledged as a determining perceptive factor, is preserved.

We have designed a complete text-to-acoustic-visual speech synthesis system based on a relatively small corpus. The system uses bimodal diphones (an acoustic component and a visual one) together with unit-selection techniques. Although the database for synthesis is small, the first results seem very promising. The system can also be used with a larger corpus: we are trying to acquire and analyze 1-2 hours of audiovisual speech, with which the quality of the synthesis should be noticeably better.
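
The bimodal unit selection can be illustrated as follows: each candidate diphone carries both an acoustic and a visual feature vector, and selection weighs both channels jointly. The features, weights, and greedy search below are assumptions for illustration, not the ViSAC implementation:

```python
# Sketch: bimodal unit selection. Each candidate diphone has acoustic
# ("ac") and visual ("vis") features; the cost combines a bimodal target
# cost and a join cost to the previously selected unit. Illustrative only.

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def select(target, candidates, prev=None, w_ac=0.5, w_vis=0.5, w_join=0.3):
    """Pick the candidate minimizing bimodal target + join cost."""
    def cost(c):
        t = w_ac * dist(c["ac"], target["ac"]) + w_vis * dist(c["vis"], target["vis"])
        j = w_join * dist(c["vis"], prev["vis"]) if prev else 0.0
        return t + j
    return min(candidates, key=cost)

target = {"ac": (1.0, 0.0), "vis": (0.2,)}
cands = [{"id": 1, "ac": (1.1, 0.1), "vis": (0.25,)},
         {"id": 2, "ac": (2.0, 1.0), "vis": (0.9,)}]
print(select(target, cands)["id"])  # → 1
```

Because the cost is computed over both channels at once, a unit that matches acoustically but not visually (or vice versa) is penalized, which is how the redundancy of the two facets is preserved.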

Next year, we will mainly evaluate the system using both subjective and objective perceptual evaluations.